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Linking Search Space Structure, Run-Time Dynamics, and 
Problem Difficulty: A Step Toward Demystifying Tabu 

Search 



Tabu search is one of the most effective heuristics for locating high-quaUty solutions to 
a diverse array of NP-hard combinatorial optimization problems. Despite the widespread 
success of tabu search, researchers have a poor understanding of many key theoretical 
aspects of this algorithm, including models of the high-level run-time dynamics and identi- 
fication of those search space features that influence problem difficulty. We consider these 
questions in the context of the job-shop scheduling problem (.TSP), a domain where tabu 
search algorithms have been shown to be remarkably effective. Previously, we demonstrated 
that the mean distance between random local optima and the nearest optimal solution is 
highly correlated with problem difficulty for a well-known tabu search algorithm for the 
JSP introduced by Taillard. In this paper, we discuss various shortcomings of this measure 
and develop a new model of problem difficulty that corrects these deficiencies. We show 
that Taillard's algorithm can be modeled with high fidelity as a simple variant of a straight- 
forward random walk. The random walk model accounts for nearly all of the variability 
in the cost required to locate both optimal and sub-optimal solutions to random JSPs, 
and provides an explanation for differences in the difficulty of random versus structured 
JSPs. Finally, we discuss and empirically substantiate two novel predictions regarding tabu 
search algorithm behavior. First, the method for constructing the initial solution is highly 
unlikely to impact the performance of tabu search. Second, tabu tenure should be selected 
to be as small as possible while simultaneously avoiding search stagnation; values larger 
than necessary lead to significant degradations in performance. 

1. Introduction 

Models of problem difficulty have excited considerable recent attention (Cheeseman, Kanef- 
sky, k Taylor, 1991; Clark, Frank, Gent, Maclntyre, Tomov, k Walsh, 1996; Singer, Gent, 
& Smaill, 2000). These models^ are designed to account for the variability in search cost 
observed for one or more algorithms on a wide range of problem instances and have yielded 



1. We refer to models of problem difEculty as cost models throughout this paper. 
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significant insight into the relationship between search space structure, problem difficulty, 
and algorithm behavior. 

In this paper, we investigate cost models of tabu search for the job-shop scheduling prob- 
lem (JSP). The JSP is an NP-hard combinatorial optimization problem and has become 
one of the standard and most widely studied problems in scheduling research. Tabu search 
algorithms are regarded as among the most effective approaches for generating high-quality 
solutions to the JSP (Jain &: Meeran, 1999) and currently represent the state-of-the-art by 
a comfortable margin over the closest competition (Nowicki h Smutnicki, 2005; Blazewicz, 
Domschke, & Pesch, 1996). While researchers have achieved considerable advances in per- 
formance since Taillard first demonstrated the effectiveness of tabu search algorithms for 
the JSP in 1989, comparatively little progress has been made toward developing an under- 
standing of how these algorithms work, i.e., characterizing the underlying high-level search 
dynamics, understanding why these dynamics are so effective on the JSP, and deducing how 
these dynamics might be modified to yield further improvements in performance. 

It is well-known that the structure of the search space influences problem difficulty for 
tabu search and other local search algorithms (Reeves, 1998). Consequently, a dominant 
approach to developing cost models is to identify those features of the search space that are 
highly correlated with search cost. We have performed extensive analyses of the relation- 
ship between various search space features and problem difficulty for Taillard's tabu search 
algorithm for the JSP (Watson, Beck, Howe, & Whitley, 2001, 2003). Our findings were 
largely negative: many features that are widely believed to influence problem difficulty for 
local search are, in fact, only weakly correlated with problem difficulty. Features such as the 
number of optimal solutions (Clark et al., 1996), the backbone size (Slaney &; Walsh, 2001), 
and the mean distance between random local optima (Mattfeld, Bierwirth, & Kopfer, 1999) 
account for less than a third of the total variability in search cost for Taillard's algorithm. 
In contrast, drawing from research on problem difficulty for local search and MAX-SAT 
(Singer et al., 2000), we found that the mean distance between random local optima and 
the nearest optimal solution, which we denote diopi-opi-, is highly correlated with problem 
difficulty, accounting for at least 2/3 of the total variability in search cost (Watson et al., 
2003). We further demonstrated that diopt-opt accounts for much of the variability in the 
cost of locating sub-optimal solutions to the JSP, and for differences in the relative difficulty 
of "square" versus "rectangular" JSPs. 

Nevertheless, the diopt-opt cost model has several shortcomings. First, the expense of 
computing diopi-opi limited our analyses to relatively small problem instances, raising con- 
cerns regarding scalability to more realistically sized problem instances. Second, residuals 
under the diopt-opt model are large for a number of problem instances, and the model is 
least accurate for the most difficult problem instances. Third, because the diopt-opt model 
provides no direct insight into the run-time behavior of Taillard's algorithm, we currently 
do not understand why diopt-opt is so highly correlated with search cost. 

We introduce a novel cost model that corrects for the aforementioned deficiencies of the 
diopt-opt cost model. This model, which based on a detailed analysis of the run-time behavior 
of Taillard's algorithm, is remarkably accurate, accounting for over 95% of the variability 
in the cost of locating both optimal and sub-optimal solutions to a wide range of problem 
instances. More specifically, we establish the following results: 
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1. Search in Taillard's algorithm appears to be effectively restricted to a sub-space Siopt+ 
of solutions that contains both local optima and solutions that are very close to 
(specifically, 1-2 moves away from) local optima. 

2. Taillard's algorithm can be modeled with remarkable fidelity as a variant of a sim- 
ple one-dimensional random walk over the Siopt+ sub-space. The walk exhibits two 
notable forms of bias in the transition probabilities. First, the probability of search 
moving closer to (farther from) the nearest optimal solution is proportional (inversely 
proportional) to the current distance from the nearest optimal solution. Second, 
search exhibits momentum: it is more likely to move closer to (farther from) the near- 
est optimal solution if search previously moved closer to (farther from) the nearest 
optimal solution. Moreover, the random walk model accounts for at least 96% of the 
variability in the mean search cost across a range of test instances. 

3. The random walk model is equally accurate for random, workflow, and flowshop JSPs. 
However, major differences exist in the number of states in the walk, i.e., the maximal 
possible distance between a solution and the nearest optimal solution. Differences in 
these maximal distances fully account for well-known differences in the difficulty of 
problem instances drawn from these various sub-classes. 

4. The accuracy of the random walk model transfers to a tabu search algorithm based 
on the powerful N5 move operator, which is more closely related to state-of-the-art 
tabu search algorithms for the JSP than Taillard's algorithm. 

5. The random walk model correctly predicts that initiating Taillard's algorithm from 
high-quality starting solutions will only improve performance if those solutions are 
very close to the nearest optimal solution. 

6. The random walk model correctly predicts that any tabu tenure larger than the mini- 
mum required to avoid search stagnation is likely to increase the fraction of the search 
space explored by Taillard's algorithm, and as a consequence yield a net increase in 
problem difficulty. Informally, the detrimental nature of large tabu tenures is often 
explained simply by observing that large tenures impact search through the loss of 
flexibility and the resulting inability to carefully explore the space of neighboring so- 
lutions. Our results provide a more detailed and concrete account of this phenomenon 
in the context of the JSP. 

The remainder of this paper is organized as follows. We begin in Sections 2 and 3 
with a description of the JSP, Taillard's algorithm, and the problem instances used in our 
analysis. The hypothesis underlying our analysis is detailed in Section 4. We summarize 
and critique prior research on problem difficulty for tabu search algorithms for the JSP 
in Section 5. Sections 6 through 9 form the core of the paper, in which we develop and 
validate our random walk model of Taillard's algorithm. In Section 10 we explore the 
applicability of the random walk model to more structured problem instances. Section 11 
explores the applicability of the random walk model to a tabu search algorithm that is 
more representative of state-of-the-art algorithms for the JSP than Taillard's algorithm. 
Section 12 details two uses of the random walk model in a predictive capacity. We conclude 
by discussing the implications of our results and directions for future research in Section 13. 
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2. Problem and Test Instances 

We consider the well-known n x m static, deterministic JSP in which n jobs must be 
processed exactly once on each of m machines (Blazewicz et al., 1996). Each job i (1 < i < 
n) is routed through each of the m machines in a pre-defined order TTj, where iTi{j) denotes 
the jth machine (1 < j < m) in the routing order of job i. The processing of job i on 
machine 'Ki{j) is denoted Oij and is called an operation. An operation Oij must be processed 
on machine for an integral duration Tjj > 0. Once initiated, processing cannot be 

pre-empted and concurrency on individual machines is not allowed, i.e., the machines are 
unit-capacity resources. For 2 < j < m, Ojj cannot begin processing until Oj(j_i) has 
completed processing. The scheduling objective is to minimize the makespan Cmax^ i-e., 
the completion time of the last operation of any job. Makespan-minimization for the JSP 
is A''P-hard for m > 2 and n > 3 (Garey, Johnson, & Sethi, 1976). 

An instance of the n x m JSP is uniquely defined by the set of nm operation durations 
Tij and n job routing orders ttj. We define a random JSP as an instance generated by (1) 
sampling the Tij independently and uniformly from an interval [LB^ UB] and (2) constructing 
the TTj from random permutations of the integer sequence = 1, . . . , m. Most often LB = 1 
and UB = 99 (Taillard, 1993; Demirkol, Mehta, &: Uzsoy, 1998). The majority of JSP 
benchmark instances, including most found in the OR Library^, are random JSPs. 

Non-random JSPs can be constructed by imposing structure on either the r^j, the tt^, 
or both. To date, researchers have only considered instances with structured yr-j, although 
it is straightforward to adapt existing methods for generating non-random Tij for fiow shop 
scheduling problems (Watson, Barbulescu, Whitley, &; Howe, 2002) to the JSP. One ap- 
proach to generating structured tt^ involves partitioning the set of m machines into wf 
contiguous, equally-sized subsets called workflow partitions. For example, when wf= 2, the 
set of m machines is partitioned into two subsets containing the machines 1 through m/2 
and m/2 + 1 through m, respectively. In such a two-partition scheme, every job must be 
processed on all machines in the first partition before proceeding to any machine in the 
second partition. No constraints are placed on the job routing orders within each partition. 
We refer to JSPs with wf = 2 simply as workflow JSPs. Less common are flowshop JSPs, 
where wf= m, i.e., all of the jobs visit the machines in the same pre-determined order. 

While the presence of structure often makes scheduling problems easier to solve (Watson 
et al., 2002), this is not the case for JSPs with structured tTj. Given fixed n and m, the 
average diflficulty of problem instances - as measured by the cost required to either locate 
an optimal solution or to prove the optimality of a solution - is empirically proportional to 
wf. In other words, random JSPs are generally the easiest instances, while fiowshop JSPs 
are the most difficult. Evidence for this observation stems from a wide variety of sources. 
For example, Storer et al. (1992) introduced sets of 50 x 10 random and workflow JSPs 
in 1992; the random JSPs were quickly solved to optimality, while the optimal makespans 
of all but one of the workflow JSPs are currently unknown. Similarly, the most difficult 
10 X 10 benchmark problems. Fisher and Thompson's infamous 10 x 10 instance and Apple- 
gate and Cook's (1991) orb instances, are all "nearly" workflow or flowshop JSPs, in that 
the requirement that a job be processed on all machines in one workflow partition before 
proceeding to any machine in the next workflow partition is slightly relaxed. 

2. http://www.brunel.ac.uk/depts/ma/research/jeb/orlib/jobshopinfo.html 



224 



Demystifying Tabu Search 



Accurate cost models of local search algorithms are generally functions of the set of 
globally optimal solutions to a problem instance (Watson et al., 2003: Singer et al., 2000), 
and the cost models we develop here further emphasize this dependency. Due in part to 
the computational cost of enumeration, our analysis is largely restricted to sets of 6 x 4 and 
6x6 random, workflow, and flowshop JSPs. Each set contains 1,000 instances apiece, with 
the Tij sampled from the interval [1, 99]. To assess the scalability of our cost models, we also 
use a set of 100 10 x 10 random JSPs, where the Tij are uniformly sampled from the interval 
[1,99]. In Section 10, we consider sets of 10 x 10 workflow and flowshop JSPs generated 
in an analogous manner. For comparative purposes with the literature, we additionally 
report results for various 10 x 10 instances found in the OR Library. Although the 10 x 10 
OR Library instances are no longer considered particularly challenging, they have received 
significant historical attention and serve to validate results obtained using our own 10 x 10 
problem set. For each instance in each of the aforementioned problem sets, a variant of 
Beck and Fox's (2000) constraint-directed scheduling algorithm was used to compute both 
the optimal makespan and the set of optimal solutions. 

3. The Algorithm: Tabu Search and the JSP 

Numerous tabu search algorithms have been developed for the JSP (Jain & Meeran, 1999). 
For our analysis, we select an algorithm introduced by Taillard in 1994. We implemented 
a variant of Taillard's algorithm, which we denote TSni"^, and easily reproduced results 
consistent with those reported by Taillard. TSni is not a state-of-the-art tabu search 
algorithm for the JSP; the algorithms of Nowicki and Smutnicki (2005), Pezzella and Merelli 
(2000), and Barnes and Chambers (1995) yield stronger overall performance. All of these 
algorithms possess a core tabu search mechanism that is very similar to that found in TSni , 
but differ in the choice of move operator, the method used to generate initial solutions, and 
the use of long-term memory mechanisms such as reintensification. 

Our choice of TSni is pragmatic. Before tackling more complex, state-of-the-art algo- 
rithms, we first develop cost models of a relatively simple but representative version of tabu 
search and then systematically assess the influence of more complex algorithmic features 
on cost models of the basic algorithm. Consequently, our implementation of TSni deviates 
from Taillard's original algorithm in three respects. First, we compute solution makespans 
exactly instead of using a computationally efficient estimation scheme. Second, we do not 
use frequency-based memory; Taillard (1994, p. 100) indicates that the benefit of such 
memory is largely restricted to instances requiring a very large number (i.e., > 1 million) of 
iterations. Third, we initiate trials of TSni from random local optima (using a scheme de- 
scribed below) instead of those resulting from Taillard's deterministic construction method. 
As discussed in Section 12.1, there is strong evidence that the type of the initial solution 
has a negligible impact on the speed with which TSni locates optimal solutions, which we 
take as the primary objective in our analysis. 



3. TSni is identical to the algorithm denoted TSj^g^^Hg^j.^ in our earlier paper (Watson et al., 2003); the 
new notation was chosen to better convey the fact that the algorithm deviates from Taillard's original 
algorithm in several respects and to emphasize the relative importance of the move operator. 
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TSni uses van Laarhoven et al.'s (1992) well-known Nl move operator, which swaps 
adjacent operations on critical blocks in the schedule.^ In our implementation, and in 
contrast to many local search algorithms for the JSP (Nowicki h Smutnicki, 1996), all 
pairs of adjacent critical operations are considered and not just those on a single critical 
path. At each iteration of TSni , the makespan of each neighbor of the current solution s 
is computed and the non-tabu neighbor ,s' G Nl (s) with the smallest makespan is selected 
for the next iteration; ties are broken randomly. Let {oij,Oki) denote the pair of adjacent 
critical operations that are swapped in s to generate s', such that appears before o^i in 
the processing order of machine niij). In the subsequent L iterations, TSni prevents or 
labels as "tabu" any move that inverts the operation pair {0^1, oij). The idea, a variant of 
frequency-based memory, is to prevent recently swapped pairs of critical operations from 
being re-established. The scalar L is known as the tabu tenure and is uniformly sampled 
every 1.2Lmax iterations from a fixed- width interval [Lmm, -^max]; such dynamic tabu tenures 
can avoid well-known cyclic search behaviors associated with fixed tabu tenures (Glover &: 
Laguna, 1997). Let sjes* denote the best solution located during any iteration of the current 
run or trial of TSni. When Cmax{s') < Cmax{sbest)^ the tabu status of s' is negated; in 
other words, TSni employs a simple aspiration level criterion. In rare cases, the minimal 
neighboring makespan may be achieved by both non-tabu and tabu-but-aspired moves, in 
which case a non-tabu move is always accepted. If all moves in Nl (s) are tabu, no move is 
accepted for the current iteration. We observe that in the absence of tabu moves preventing 
improvement of the current solution's makespan, TS^i acts as a simple greedy descent 
procedure. More specifically, it is clear that the core search bias exhibited by TS^i is 
steepest-descent local search, such that there is significant pressure toward local optima. 

Tabu tenure can have a major impact on performance. Based on empirical tests, Taillard 
defines L™„ = 0.8X andL^^ = 1.2X, where X = (n+m/2)-e-"/5™+7V/2-e-5™/" (Taillard, 
1994); n and m are respectively the number of jobs and machines in the problem instance 
and N = nm. In preliminary experimentation, we observed that the resulting tenure values 
for our 6x4 and 6x6 problem sets (respectively [3,5] and [4,6]) failed to prevent cycling 
or stagnation behavior. Instead, we set [Lmm, -^maj equal to [6, 14] for all trials involving 
these instances and re-sample the tabu tenure every 15 iterations. For 10 x 10 instances 
we set [Lmim Lmax] equal to [8, 14] for all trials and again re-sample the tabu tenure every 
15 iterations; the specific values are taken from Taillard's research, which also ignored the 
aforementioned rule for trials involving 10 x 10 instances (Taillard, 1994). Taillard's rules 
are used unmodified in all trials involving larger problem instances, e.g., those analyzed 
below in Section 4. 

3.1 Cost and Distance Metrics 

Unlike more effective JSP move operators such as N5 (Nowicki & Smutnicki, 1996), the Nl 
operator induces search spaces that are connected, in that it is always possible to move from 
an arbitrary solution to a global optimum. Consequently, it is possible to construct a local 
search algorithm based on Nl that is probabilistically approximately complete (PAC) (Hoos, 
1998), such that an optimal solution will eventually be located given sufficiently large run- 
times. Our experimental results suggest that TSni is PAC, subject to reasonable settings for 

4. Our notation for move operators is taken from Blazewicz (1996). 
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the tabu tenure; given our rules for selecting Lmm and Lmax- no trial of TSni failed to locate 
an optimal solution to any of the problem instances described in Section 2. In particular, 
the tabu tenure must be large enough for TSni to escape local optima; using short tabu 
tenures, it is straightforward to construct examples where TSj^i will become permanently 
trapped in the attractor basin of a single local optimum. To be provably PAC under general 
parameter settings, the TSni algorithm would likely require modifications enabling it to 
accept an arbitrary move at any given iteration, allowing search to always progress toward 
a global optimum; Hoos (1998) discusses similar requirements in the context of PAC local 
search algorithms for MAX-SAT. We have not pursued such modifications because they 
ignore practical efficiency issues associated with poor parameter value selection, and because 
it is unclear how the induced randomness would impact the core tabu search dynamics. 

The empirical PAC property enables us to naturally define the cost required to solve a 
given problem instance for a single trial of TSj^i as the number of iterations required to 
locate a globally optimal solution. In general, search cost under TSni is a random variable 
with an approximately exponential distribution, as we discuss in Section 9. Consequently, 
we define the search cost for a given problem instance as either the median or mean number 
of iterations required to locate an optimal solution; the respective quantities are denoted 
by Cq2 and c. We estimate both Cq2 and c using 1,000 independent trials. Due to the 
exponential nature of the underlying distribution, a large number of samples is required to 
achieve reasonably accurate estimates of both statistics. 

Our analysis relies on the notion of the distance D{si, S2) between two solutions si and 
S2, which we take as the well-known disjunctive graph distance (Mattfeld et al., 1999). 
Let ^{i,j,k,s) denote a predicate that determines whether job j appears before job k in 
the processing order of machine i of solution s. The disjunctive graph distance D{si,S2) 
between si and S2 is then defined as 

m n—1 n 
i—l j—1 k—j+1 

where the symbol © denotes the Boolean XOR operator. Informally, the disjunctive graph 
distance simply captures the degree of heterogeneity observed in the machine processing 
sequences of two solutions. A notable property of the disjunctive graph distance is that it 
serves as a lower bound, which is empirically tight, on the number of Nl moves required 
to transform si into .S2. This is key in our analysis, as computation of the exact number of 
Nl moves required to transform si into S2 is NP-hard (Vaessens, 1995). In the remainder 
of this paper, we use the terms "distance" and "disjunctive graph distance" synonymously. 

Finally, we define a "random local optimum" as a solution resulting from the application 
of steepest-descent local search under the Nl operator to a random semi-active solution. 
A semi-active solution is defined as a feasible solution (i.e., lacking cyclic ordering depen- 
dencies) in which all operations are processed at their earliest possible starting time. To 
construct random semi-active solutions, we use a procedure introduced by Mattfeld (1996, 
Section 2.2). The steepest-descent procedure employs random tie-breaking in the presence 
of multiple equally good alternatives and terminates once a solution s is located such that 
^s' e Nl{s),Cmax{s) <C 



227 



Watson, Whitley, & Howe 



4. The Run-Time Behavior of TSni Motivating Observations 

For a given problem instance, consider the space of feasible solutions S and the sub-space 
Siopt ^ S containing all local optima. Due to the strong bias toward local optima induced 
by the core steepest-descent strategy, we expect TSni to frequently sample solutions in Siopt 
during search. However, the degree to which solutions in Siopt f^rs actually representative of 
solutions visited by TSni is a function of both the strength of local optima attractor basins 
in the JSP and the specifics of the short-term memory mechanism. In particular, strong 
attractor basins would require TSni to move far away from local optima in order to avoid 
search stagnation. Recently, Watson (2003) showed that the attractor basins of local optima 
in the JSP are surprisingly weak in general and can be escaped with high probability simply 
by (1) accepting a short random sequence (i.e., of length 1 or 2 elements) of monotonically 
worsening moves and (2) re-initiating greedy descent. In other words, relatively small 
perturbations are sufficient to move local search out of the attractor basin of a given local 
optimum in the JSP. We observe that the aforementioned procedure provides an operational 
definition of attractor basin strength, e.g., a specific escape probability given a "worsening" 
random sequence of length k, in contrast to more informal notions such as narrowness, 
width, or diameter. 

Based on these observations, we hypothesize that search in TSni is effectively restricted 
to the sub-space Siopt+ D Siopt containing both local optima and solutions that are very 
close to local optima in terms of disjunctive graph distance or, equivalently, the number 
of Nl moves. To test this hypothesis, we monitor the descent distance of solutions visited 
by TSni during search on a range of random JSPs taken from the OR Library. We define 
the descent distance of a candidate solution s as the disjunctive graph distance D{s,s') 
between s and a local optimum s' generated by applying steepest-descent under the Nl 
operator to s. In reality, descent distance is stochastic due to the use of random tie- 
breaking during steepest-descent. We avoid exact characterization of descent distance for 
any particular s and instead compute descent distance statistics over a wide range of s. For 
each problem instance, we execute TSni for one million iterations, computing the descent 
distance of the current solution at each iteration and recording the resulting time-series; 
trials of TSni are terminated once an optimal solution is encountered and re-started from a 
random local optimum. Several researchers have introduced measures that are conceptually 
related to descent distance, but are in contrast based on differences in solution fitness. For 
example, search depth (Hajek, 1988) is defined as the minimal increase in fitness (assuming 
minimization) that must be accepted in order to escape a local optimum attractor basin. 
Similarly, Schuurmans and Southey (2001) define search depth as the difference between 
the fitness of a solution s and that of a global optimum s*. 

Summary statistics for the resulting descent distances are reported in Table 1; for com- 
parative purposes, we additionally report the mean descent distance observed for one million 
random semi-active solutions. The mean and median descent distance statistics indicate 
that TSni consistently remains only 1-2 moves away from local optima, independent of 
problem size. Although search is occasionally driven very far from local optima, such events 
are rare - as corroborated by the low standard deviations. Empirical evidence suggests that 
large-distance events are not due to the existence of local optima with very deep attractor 
basins, but are rather an artifact of TSni s short-term memory mechanism. These results 
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Descent Distance 
Under TSni 




Mean Descent Distance 
for Random Solutions 


Size 


Instance 


Median 


Mean 


Std. Dev. 


Max. 


10 X 10 


lal6 


1 


1.84 


1.78 


20 


12.43 


10 X 10 


lair 


1 


1.96 


1.86 


16 


13.03 


10 X 10 


lal8 


2 


2.06 


1.94 


18 


13.42 


10 X 10 


laig 


2 


2.15 


1.94 


18 


13.99 


10 X 10 


la20 


2 


2.22 


1.94 


20 


13.00 


10 X 10 


abz5 


2 


2.16 


1.95 


16 


13.96 


10 X 10 


abz6 


2 


2.12 


1.89 


16 


12.95 


15 X 15 


taOl 


1 


1.95 


1.97 


18 


24.38 


20 X 15 


tall 


1 


1.51 


1.83 


21 


30.07 


20 X 20 


ta21 


2 


2.28 


2.24 


24 


34.10 


30 X 15 


ta31 


1 


2.00 


2.20 


20 


39.27 


30 X 20 


ta41 


1 


1.68 


2.02 


36 


46.19 


50 X 15 


ta51 


1 


1.86 


2.35 


28 


45.11 


50 X 20 


ta61 


2 


2.25 


2.56 


60 


55.50 


100 X 20 


ta71 


1 


2.50 


3.16 


40 


71.26 



Table 1: Descent distance statistics for TS^i on select random JSPs from the OR Library. 

Statistics are taken over time-series of length one million iterations. Results for 
random solutions are computed using one million random semi-active solutions. 



support our hypothesis that search in TSmi is largely restricted to the sub-space of feasible 
solutions containing both local optima and solutions that are very close (in terms of dis- 
junctive graph distance) to local optima. Two factors enable this behavior: the strong bias 
toward local optima that is induced by the core steepest-descent strategy of TS^i and the 
relative weakness of attractor basins in the JSP. These factors also enable TS^i to ignore 
substantial proportions of the search space, as the mean descent distance under TS^i is 
only a fraction of the descent distance of random semi-active solutions. 

The primary purpose of short-term memory in tabu search is to enable escape from 
local optima (Glover &: Laguna, 1997). We recall that TS^i does not employ any long-term 
memory mechanism. Given (1) the absence of an explicit high-level search strategy and (2) 
the lack of any a priori evidence to suggest that TS^i is biased toward specific, e.g., high- 
quality, regions of Siopt+, we propose the following hypothesis: TSmi is simply performing a 
random walk over the Siopt+ sub-space. If true, problem difficulty should be correlated with 
|S';op<+|, the size of the Siopt+ sub-space. Implicit in this assertion is the assumption that 
the connectivity in Siopt+ is sufficiently regular such that optimal solutions are not isolated 
relative to the rest of the search space, e.g., only reachable through a few highly improbable 
pathways. Finally, we observe that the presence of multiple optimal solutions reduces the 
proportion of Siopt+ that must be explored, on average, before a globally optimal solution 
is encountered. Consequently, we focus instead on the effective size of Siopt+, which we 
denote by \Siopt+\'- For now, this measure is only an abstraction used to capture the notion 
that problem difficulty is a function of IS^opt+l and the number and distribution of globally 
optimal solutions within that sub-space. Specific estimates for \Siopt+\' are considered below 
in Sections 5 through 7. 
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5. Static Cost Models: A Summary and Critique of Prior Research 

With the exception of research on phase transitions in mean instance difficulty (Clark et al., 
1996), all existing cost models of local search algorithms are based on structural analyses of 
the underlying fitness landscapes. Informally, a fitness landscape is a vertex- weighted graph 
in which the verticies represent candidate solutions, the vertex weights represent the fitness 
or worth of solutions, and the edges capture solution adjacency relationships induced by a 
neighborhood or move operator (c.f.. Reeves, 1998; Stadler, 2002). 

Previously, we analyzed the relationship between various fitness landscape features and 
problem difficulty for TSni (Watson et al., 2001, 2003). We used regression methods 
to construct statistical models relating one or more (optionally transformed) landscape 
features, e.g., the logarithm of the number of optimal solutions, to the transformed cost 
logiQicQi) required to locate optimal solutions to 6 x 4 and 6x6 random .JSPs. Because 
they are based on static, time- invariant features of the fitness landscape, we refer to these 
models as static cost models. The accuracy of a static cost model can be quantified as 
the regression r^, i.e., the proportion of the total variability in search cost accounted for 
by the model. Most of the models we considered were based on simple linear regression 
over unitary features, such that the captured variability under the assumption of a linear 
functional relationship between landscape features and search cost. We found that the 
accuracy of static cost models based on well-known landscape features such as the number 
of optimal solutions (Clark et al., 1996), the backbone size (Slaney & Walsh, 2001), and 
the average distance between random local optima (Mattfeld et al., 1999) was only weak- 
to-moderate, with ranging from 0.22 to 0.54. Although not reported, we additionally 
investigated measures such as fitness-distance correlation (Boese, Kahng, & Muddu, 1994) 
and landscape correlation length (Stadler, 2002) - measures that are much more commonly 
investigated in operations research and evolutionary computing than in Al - and obtained 
even lower values. 

Drawing from research on MAX-SAT (Singer et al., 2000), we then demonstrated that a 
static cost model based on the mean distance between random local optima and the nearest 
optimal solution, which we denote diopt-opu is significantly more accurate, yielding values 
of 0.83 and 0.65 for 6 x 4 and 6x6 random .JSPs, respectively. As shown in the left side of 
Figure 1, the actual Cq2 for 6 x 6 random JSPs is typically within a factor of 10 (i.e., no more 
than 10 times and no less than 1/10) of the predicted Cq2, although in a few exceptional 
cases the observed differences exceed a factor of 100. Additionally, we showed that the 
diopi-opi model accounts for most of the variability in the cost required to locate sub-optimal 
solutions to these same problem instances and provides an explanation for differences in the 
relative difficulty of "square" (n/m ss 1) versus "rectangular" [n/m 2> 1) random .JSPs. 

For the variant of the diopt-opt measure associated with MAX-SAT, Singer (2000, p. 
67) speculates that instances with large diopt-opt are more difficult due to "... initial large 
distance from the [optimal] solutions or the extensiveness of the ... area in which the 
[optimal] solutions lie, or a combination of these factors." Building on this observation, we 
view diopt-opt as a concrete measure of \Siopt+\', specifically because diopt-opt simultaneously 
accounts for both the size of Siopt-^ and the distribution of optimal solutions within Siopt+- 
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Figure 1: Scatter-plots of diopt-opi versus Cq2 for 6 x 6 (left figure) and 10 x 10 (right figure) 
random JSPs; the least-squares fit lines are super-imposed. 



In doing so, we assume that random local optima are representative of solutions in Siopt-\-, 
which is intuitively justifiable given the low mean search depth observed under TSni 

In support of our view concerning the key role of \Siopi+\' in problem difficulty, we 
observe that the other static cost models of TSni considered by Watson and his colleagues 
(2001, 2003) are based on landscape features (the backbone size, the number of optimal 
solutions, or the average distance between local optima) that quantify either the size of 
Siopi+ or the number/distribution of optimal solutions, but not both. In other words, the 
underlying measures fail to capture one of the two key dimensions of \Siopt+\' ■ 

Despite its explanatory power, we previously identified several deficiencies of the diopt-opt 
cost model (Watson et al., 2003). First, the model is least accurate for the most difficult 
problem instances within a fixed-size group. Second, the model fails to account for a non- 
trivial proportion (~ 1/3) of the variability in problem difficulty for 6x6 random JSPs. 
Third, model accuracy fails to transfer to more structured workflow JSPs. 

5.1 An Analysis of Scalability 

Differences in the accuracy of the diopt-opt model on 6 x 4 and 6x6 random JSPs also 
raise concerns regarding scalability to larger, more realistically sized problem instances. 
Empirically, we have observed that the mean number of optimal solutions in random JSPs 
grows rapidly with increases in problem size. When coupled with the difficulty of "square" 
instances with n ~ m > 10, the resulting cost of computing both diopi.gpt and Cqj previously 
restricted our analysis to 6 x 4 and 6x6 random JSPs. However, with newer microprocessors, 
we are now able to assess the accuracy of the diopt-opt cost model on larger random JSPs. 

We compute diopt-opt for the 92 of our 100 10 x 10 random JSPs with < 50 million 
optimal solutions; the computation is unpractical for the remaining 8 instances. Estimates 

5. As discussed in Section 6, empirical data obtained during our search for more accurate cost models of 
TSni ultimately forces us to retract, or more precisely modify, this assumption. However, restrictions 
on the Siopt-^. sub-space still play a central role in all subsequent cost models. 
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of diopt-opt f-i's based on 5,000 random local optima. We show a scatter-plot of diopt-opt 
versus Cq2 for these problem instances in the right side of Figure 1. The value of the 
corresponding regression model is 0.46, which represents a 33% decrease in model accuracy 
relative to the 6x6 problem set. This result demonstrates the failure of the diopt-opt model 
to scale to larger JSPs. We observe similar drops in accuracy for static cost models based on 
the number of optimal solutions, the backbone size, and the mean distance between random 
local optima (Watson, 2003). Unfortunately, we cannot currently assess larger rectangular 
instances due to the vast numbers (i.e., tens of billions) of optimal solutions. 

6. Accounting for Search Bias: A Quasi-Dynamic Cost Model 

The deficiencies of the diopt-opt cost model indicate that either (1) diopt-opt is not an entirely 
accurate measure of IS/opi+l' or (2) our random walk hypothesis is incorrect, i.e., IS/opi+l' 
is not completely indicative of problem difficulty. We now focus on the first alternative, 
with the goal of developing a more accurate measure of \Siopt+\' than diopt-opt- Instead of 
random local optima, we instead consider the set of solutions visited by TSi^i during search. 
We refer to the resulting cost model as a quasi-dynamic cost model. The "quasi-dynamic" 
modifier derives from the fact that although algorithm dynamics are taken into account, an 
explicit model of run-time behavior is not constructed. 

We develop our quasi-dynamic cost model of TSni by analyzing the distances between 
solutions visited during search and the corresponding nearest optimal solutions. Let dopt{s) 
denote the distance between a solution s and the nearest optimal solution, i.e., dopi{s) = 
minxes* D{x, s) where S* denotes the set of optimal solutions. Let Xiabu denote the set 
of solutions visited by TS^i during an extended run on a given problem instance, and let 
Xriopt denote a set of random local optima. We then define dtabu-opt (diopt-opt) the mean 
distance dopt{s) between solutions s € Xtabu € X^iopt) and the nearest optimal solution. 

Figure 2 shows empirical distributions of dopt{s) for the X^iopt and Xtabu of two 10 x 
10 random JSPs. Both types of distribution are generally symmetric and Gaussian-like, 
although we infrequently observe skewed distributions both with and without heavier-than- 
Gaussian tails. Deviations from the Gaussian ideal are more prevalent in the smaller 6x4 
and 6x6 problem sets. In all of our test instances, dtabu-opt < diopt-opt^ TS^^ consistently 
visits solutions that on average are closer to an optimal solution than randomly generated 
local optima. Similar observations hold for solution quality, such that solutions in Xtabu 
consistently possess lower makespans than solutions in Xriopt- 

The histograms shown in Figure 2 serve as illustrative examples of two types of search 
bias exhibited by TSni - First, search is strongly biased toward solutions that are an "aver- 
age" distance between the nearest optimal solution and solutions that are maximally distant 
from the nearest optimal solution. Second, random local optima are not necessarily rep- 
resentative of the set of solutions visited during search, contradicting the assumption we 
stated previously in Section 5. Although search in TS^i is largely restricted to Siopt+, there 
potentially exist large portions of Siopt+ - for reasons we currently do not fully understand - 
that TSni is unlikely visit. Failure to account for these unexplored regions will necessarily 
yield conservative estimates of \Siopt+\' - We observe that these results do not contradict 
our random walk hypothesis; rather, we still assert that TS^i is performing a random walk 
over a potentially restricted sub-set of Siopt+- 
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Figure 2: Histograms of the distance to the nearest optimal solution for both random local 
optima and solutions visited by TSni for two example 10 x 10 random JSPs (each 
figure corresponds to a unique problem instance). 



We believe that the deficiencies of the diopt-opt model are due in large part to the failure of 
the underlying measure to accurately depict the sub-space of solutions likely to be explored 
by TSni ■ In contrast, the dtabu-opt measure by definition accounts for the set of solutions 
likely to be visited by TSni ■ Consequently, we hypothesize that a quasi-dynamic cost 
model based on the dtabu-opt measure should yield significant improvements in accuracy over 
the static diopt-opt cost model. As evidence for this hypothesis, we observe that although 
discrepancies between the distributions of dopt{s) for random local optima and solutions 
visited by TSni were minimal in our 6x4 problem sets, significant differences were observed 
in the larger 6x6 and 10 x 10 problem sets - the same instances for which the diopt-opt 
model is least accurate. To further illustrate the magnitude of the differences, we observe 
that for the 42 of our 10 x 10 random JSPs with < 100,000 optimal solutions, dtabu-opt is on 
average 37% lower than diopt-opt- For the same instances, the solutions in Xtabu on average 
possess a makespan 13% lower than those of solutions in Xriopi- 

We now quantify the accuracy of the dtabu-opt quasi-dynamic cost model on 6 x 4, 6 x 6, 
and 10 X 10 random JSPs. For any given instance, we construct Xtabu using solutions visited 
by TSni over a variable number of independent trials. A trial is initiated from a random 
local optimum and terminated once a globally optimal solution is located. The termination 
criterion is imposed because there exist globally optimal solutions from which no moves are 
possible under the Nl move operator (Nowicki &; Smutnicki, 1996). We terminate the entire 
process, including the current trial, once |Xia6„|=100,000. The resulting Xtabu are then used 
to compute dtabu-opt] the large number of samples is required to achieve reasonably accurate 
estimates of this statistic. 

Scatter-plots of dtabu-opt versus Cq2 for the 6x4 and 6x6 problem sets are respectively 
shown in the upper left and upper right sides of Figure 3. Regression models of dtabu-opt 
versus logiQ{cQ2) yield respective values of 0.84 and 0.78. corresponding to 4% and 20% 
increases in accuracy relative to the dinpi-npi. cost model. The actual Cq2 typically deviate 
from the predicted Cqj by no more than a factor of five and we observe fewer and less 
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Figure 3: Scatter-plots of diabu-opt versus search cost Cq2 for 6 x 4 (upper left figure), 6x6 
(upper right figure), and 10 x 10 (lower figure) random JSPs; the least-squares 
fit lines are super-imposed. 



extreme large-residual instances than under the diopt-opt model (only three out of 2,000 data 
points differ by more than a factor of 10.) 

For the set of 42 10 x 10 random JSPs with < 100,000 optimal solutions,^ a regression 
model of dtabu-opt versus ^og^o(cQ2) yields an value of 0.66 (see the lower portion of 
Figure 3); the computation of dfabu-opt is unpractical for the remaining instances. The 
resulting is 41% greater than that observed for the diopt-opt model on the same instances. 
Further, the actual Cqj is typically within a factor of 5 of the predicted Cqj and in no 
case is the discrepancy larger than a factor of 10. We have also annotated the scatter-plot 
shown in the lower portion of Figure 3 with data for those five of the seven 10 x 10 random 
JSPs present in the OR Library with < 100, 000 optimal solutions. The abz5 and lal9 

6. Our selection criterion does not lead to a clean distinction between easy and hard problem instances; 
the hardest 10 x 10 instance has approximately 1.5 million optimal solutions. However, instances with 
< 100,000 optimal solutions are generally more difficult, with a median cq2 of 65,710, versus 13,291 for 
instances with more than 100,000 optimal solutions. 
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instances have been found to be the most difficult in this set (Jain &; Meeran, 1999), which 
is consistent with the observed values of dtabu-opt- 

In conclusion, TS^i is highly unlikely to visit large regions of the search space for many 
problem instances. As a consequence, measures of \Siopt+\' based on purely random local 
optima are likely to be conservative and inaccurate, providing a partial explanation for the 
failures of the diopi.gpt model. In contrast, the dtabu-opt measure by definition accounts for this 
phenomenon, yielding a more accurate measure of \Siopi+\' and a more accurate cost model. 
However, despite the significant improvements in accuracy, the diopt-opt ^ind dtabu-opt do share 
two fundamental deficiencies: accuracy still fails to scale to larger problem instances, and 
the models provide no direct insight into the relationship between the underlying measures 
and algorithmic run-time dynamics. 

7. A Dynamic Cost Model 

The dinpi-opi and diabu-opL cost models provide strong evidence that TS^i is effectively per- 
forming a random walk over a potentially restricted subset of the Siopt+ sub-space. However, 
we have yet to propose any specific details, e.g., the set of states in the model or the quali- 
tative nature of the transition probabilities. The dynamic behavior of any memoryless local 
search algorithm, e.g., iterated local search (Lourenco, Martin, Sz Stiitzle, 2003) and simu- 
lated annealing (Kirkpatrick, Gelatt, &; Vecchi, 1983), can, at least in principle, be modeled 
using Markov chains: the set of feasible solutions is known, the transition probabilities 
between neighboring solutions can be computed, and the Markov property is preserved. 
Local search algorithms augmented with memory, e.g., tabu search, can also be modeled as 
Markov chains by embedding the contents of memory into the state definition, such that 
the Markov property is preserved. Although exact, the resulting models generally require 
at least (depending on the complexity of the memory) an exponential number of states - 
C'(2™-(2)) in the .JSP - and therefore provide little insight into the qualitative nature of the 
search dynamics. The challenge is to develop aggregate models in which large numbers of 
states are grouped into meta-states, yielding more tractable and consequently understand- 
able Markov chains. 

7.1 Definition 

To model the impact of short-term memory on the behavior of TSni , we first analyze how 
search progresses either toward or away from the nearest optimal solution. In Figure 4, we 
show a time-series of the distance to the nearest optimal solution for both a random walk 
under the Nl move operator and TS^i on a typical 10 x 10 random JSP. We obtain similar 
results on a sampling of random 6x4 and 6x6 JSPs, in addition to a number of structured 
problem instances. The random walk exhibits minimal short-term trending behavior, with 
search moving away from or closer to an optimal solution with roughly equal probability. 
In contrast, we observe strong regularities in the behavior of TS^i ■ The time-series shown 
in the right side of Figure 4 demonstrates that TSni is able to maintain search gradients 
for extended periods of time. This observation leads to the following hypothesis: the short- 
term memory mechanism of TSni acts to consistently bias search either toward or away 
from the nearest optimal solution. 
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Figure 4: Time-series of the distance to the nearest optimal solution for solutions visited 
by a random walk (left figure) and TSni (right figure) for a 10 x 10 random JSP. 



Based on this hypothesis, we define a state Sx,i in our Markov model of TSni as a 
pair representing both (1) the set of solutions distance i from the nearest optimal solution 
and (2) the current search gradient x. We denote the numeric values x G [—1,0,1] with 
the symbols closer, equal, and farther, respectively. In effect, we are modeling the impact of 
short-term memory as a simple scalar and embedding this scalar into the state definition. 
Next, we denote the maximum possible distance from an arbitrary solution to the nearest 
optimal solution by Dmax- Finally, let the conditional probability P{Sj^x'\Si,x) denote the 
probability of simultaneously altering the search gradient from x to x' and moving from 
a solution distance i from the nearest optimal solution to a solution distance j from the 
nearest optimal solution. The majority of these probabilities equal 0, specifically for any 
pair of states Sj^^' and Si^x where |« — j| > 1 or when simultaneous changes in both the 
gradient and the distance to the nearest optimal solution are logically impossible, e.g., from 
state Si^cioser to state Si^i^cioser- For each i, I < i < D^ax^ there exist at most the following 
nine non-zero transition probabilities: 

* Pi,^i—l,closer\Si,close'r)i Pi,Si^equal\Si^closer) i and P{^Sij^\ jo,rther\^i, closer) 

• P^Si—\^Qioser\Si^equal)^ PiSi^equal\Si^equal)i and P{Si^ijarther\Si^equal) 

• P{Si—l,closer\SiJarther)i P{Si^equal\SiJarther)i and P{Si-\-ijarther\Si,farther) 

The probabilities P{Sj^x'\Si,x) are also subject to the following total-probability constraints: 

* PiSi—i QiQggj'\Si QiQggj,) + P{Si gqjiai\Si QiQggj') + P{Si^ijaj,ifigj'\Si QiQggj') 1 

* P{^i—l,closer\^i, equal) ~l~ P{^i,equal\^i, equal) ~l~ P{^i+ljarther\^i, equal) 1 

• PiSi—l^closer\SiJarther) ~l~ PiSi^equall^iJarther) ~l~ Pi^i+ljartherl^i, farther) ~ 1 
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To complete the Markov model of TSmi , we create a reflecting barrier at i = Dmax and 
an absorbing state at i = by respectively imposing the constraints -P(S'o,c/oser-| •S'o, c/oser) = 1 
and P{SDma.-i,cioser\SD,na,jaHhcT) +P{S D^a.,equai\S D^a.Jarther) = 1- These Constraints yield 
three isolated states: So^eguah Sojarther, and SB^^,,closer- Consequently the Markov model 
consists of exactly 3 ■ Dmax states. 

We conclude by noting that an aggregated random walk model of TS^i (or any other 
local search algorithm) will not capture the full detail of the underlying search process. 
In particular, the partition induced by aggregating JSP solutions based on their distance 
to the nearest optimal solution is not lumpable (Kemeny & Snell, 1960); distinct solutions 
at identical distances to the nearest optimal solution have different transition probabilities 
for moving closer to and farther from the nearest optimal solution, due to both (1) unique 
numbers and distributions of infeasible neighbors and (2) unique distributions of neighbor 
makespans. Thus, the question we are posing is whether there exist sufficient regularities 
in the transition probabilities for solutions within a given partition such that it is possible 
to closely approximate the behavior of the full Markov chain using a reduced-order chain. 

7.2 Parameter Estimation 

We estimate the Markov model parameters Dmax and the set of P{Sj^x'\Si,x) by sampling a 
subset of solutions visited by TSni ■ For a given problem instance, we obtain at least Smm 
and at most Smax distinct solutions at each distance i from the nearest optimal solution, 
where 2 < i < rint{diopt-opt)-^ For the 6x4 and 6x6 problem sets, we let Smin = 50 
and Smax = 250; for the 10 x 10 set, we let Smin = 50 and Smax = 500. These values of 
Smin and Smax are large enough to ensure that artificially isolated states are not generated 
due to an insufficient number of samples. Individual trials of TSni are executed until a 
globally optimal solution is located, at which point a new trial is initiated. The process 
repeats until at least Smin samples are obtained for each distance i from the nearest optimal 
solution, 2 < i < rint{diopt-opt)j at which point the current algorithmic trial is immediately 
terminated. 

The upper bound Smax is imposed to mitigate the impact of solutions that are sta- 
tistically unlikely to be visited by TSni during any individual trial, but are nonetheless 
encountered with non-negligible probability when executing the large number of trials that 
are required to achieve the sampling termination criterion. Informally, Smax allows us to 
ensure that only truly representative solutions are included in the sample set. Candidate 
solutions are only considered for inclusion every 100 iterations for the smaller 6x4 and 6x6 
problem sets, and every 200 iterations for the larger 10 x 10 problem set. Such periodic 
sampling ensures that the collected samples are uncorrelated; the specific sampling intervals 
are based on estimates of the landscape correlation length (Mattfeld et al., 1999), i.e., the 
expected number of iterations of a random walk after which solution fitness is uncorrelated. 
Candidate solutions are accepted in order of appearance, i.e., the first Smin distinct solutions 
encountered at a given distance i are always retained, and are discarded once the number 
of prior samples at distance i exceeds Smax- For each sampled solution at distance i from 
the nearest optimal solution and search gradient x, we track the distance j and gradient x' 
for the solution in the subsequent iteration. 

7. The function rint{x) is defined as rint{x) = [a; -I- 0.5J, which rounds to the nearest integer. 
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Let i^{Si^x) and ii^{Sj^x'\Si,x) respectively denote the total number of observed samples 
in state Si^x and the total number of observed transitions from a state S-i^x to a state 
Sj^x'- Estimates of the transition probabilities are computed using the obvious formulas, 
e.g., P{Si-i^closer\Si,cioser) = #{Si-i,cioser\Si,cioser) /#{Si,cioser)- We frequently observe at 
least Smin samples for distances i > rint{diopt-opt) ■ To estimate D^ax^ we first determine the 
minimal X such that the number of samples at distance X is less than Smin, i-e., the smallest 
distance at which samples are not consistently observed. ^Ve then define Dmax — 
omission of states Si^x with i > X has negligible impact on model accuracy. Finally, we 
observe that our estimates of both the P{Sj^x'\Si,x) and Dmax are largely insensitive to both 
the initial solution and the sequence of solutions visited during the various trials, i.e., the 
statistics appear to be isotropic. 

The aforementioned process is online in that the computed parameter estimates are 
based on solutions actually visited by TSi^i . Ideally, parameter estimates could be derived 
independently of the algorithm under consideration, for example via an analysis of random 
local optima. However, two factors conspire to prevent such an approach in the JSP. First, 
as shown in Section 6, random local optima are typically not representative of solutions 
visited by TS^i during search, and we currently do not fully understand the root cause 
of this phenomenon (although preliminary evidence indicates it is due in large part to the 
distribution of infeasible solutions within the feasible space). Second, it is unclear how 
to realistically sample the contents of short-term memory. Consequently, we are currently 
forced to use TS^i to generate, via a Monte Carlo-like process, a representative set of 
samples. Further, we note that the often deterministic behavior of TS^i (discounting ties 
in the case of multiple equally good non-tabu moves and randomization of the tabu tenure) 
generally prevents direct characterization of the distribution of transition probabilities for 
any single sample, as is possible for local search algorithms with a stronger stochastic 
component, e.g., iterated local search or Metropolis sampling (Watson, 2003). 

In Figure 5, we show the estimated probabilities of moving closer to (left figure) or 
farther from (right figure) the nearest optimal solution for a typical 10 x 10 random JSP; 
the probability of maintaining an equal search gradient is negligible [p < 0.1), independent of 
the current distance to the nearest optimal solution. We observe qualitatively similar results 
for all of our 6 x 4, 6 x 6, and 10 x 10 random JSPs, although we note that results for most 
instances generally possess more noise (i.e., small-scale irregularities) than those observed in 
Figure 5. The results indicate that the probability of continuing to move closer to (farther 
from) the nearest optimal solution is typically proportional (inversely proportional) to the 
current distance from the nearest optimum. An exception occurs when i < 10 and the 
gradient is closer, where the probability of continuing to move closer to an optimal solution 
actually rises as i ^ 0. We currently have no explanation for this phenomenon, although it 
appears to be due in part to the steepest-descent bias exhibited by TSni ■ 

The probabilities of moving closer to/farther from the nearest optimal solution are, 
in general, roughly symniGtric ciroiind Dmax 

/2, such that search in TSnj is biased toward 
solutions that are an average distance from the nearest optimal solution. This characteristic 
provides an explanation for the Gaussian-like distributions of dopt observed for solutions 
visited during search, e.g., as shown in Figure 2. The impact of short-term memory is 
also evident, as the probability of maintaining the current search gradient is high and 
consistently exceeds 0.5 in all of the problem instances we examined, with the exception 
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Figure 5: The transition probabilities for moving closer to (left figure) or farther from (right 
figure) the nearest optimal solution under TS^i for a typical 10 x 10 random JSP. 



of occasional brief drops to no lower than 0.4 at extremal distances i, i.e., « or « 
Dmax- The probability of inverting the current gradient is also a function of the distance 
to the nearest optimal solution and the degree of change. For example, the probability 
of switching gradients from equal to closer is higher than the probability of switching from 
farther to closer. Consistent with the results presented above in Section 6, (1) the distance i 

at ^h\dl_P{Si^i^closer\SiMoser) = PiSi+lJarl.her\SiJarlher) is approximately equal to diabu-opl 

and (2) diopi.gpt generally falls anywhere in the range [Dmax/ '2, Dmax]- Finally, we note the 
resemblance between the transition probabilities in our Markov model and those in the 
well-known Ehrenfest model found in the literature on probability theory (Feller, 1968, p. 
377); in both models, the random walk dynamics can be viewed as a simple diffusion process 
with a central restoring force. 

7.3 Validation 

To validate the random walk model, we compare the actual mean search cost c observed 
under TSni with the corresponding value predicted by the model. We then construct 
a logiQ-loQiQ linear regression model of the predicted versus actual c and quantify model 
accuracy as the resulting r^. Because it is based on the random walk model of TSni, we 
refer to the resulting linear regression model as a dynamic cost model. Due to the close 
relationship between the random walk and dynamic cost models, we use the two terms 
interchangeably when identification of a more specific context is unnecessary. 

To compute the predicted c for a given problem instance, we repeatedly simulate the 
corresponding random walk model defined by the parameters Dmax, the set of states Si^x^ and 
the estimated transition probabilities P{Sj^x'\Si,x)- Each simulation trial is initiated from a 
state Sm,m where m = dopi{s) for a trial-specific random local optimum s and n equals closer 
or farther with equal probability: recall that the probability of maintaining an equal search 
gradient is negligible. We compute m exactly (as opposed to simply using rint{diopt-opt)) 
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in order to control for possible effects of the distribution of dgpt for random local optima, 
which tend to be more irregular (i.e., non-Gaussian) for small problem instances; letting 
m = rint{diopt-opt) results in a slight (< 5%) decrease in model accuracy. We then define 
the predicted search cost c as the mean number of simulated iterations required to achieve 
the absorbing state So^cioser] statistics are taken over 10,000 independent trials. 

We first consider results obtained for our 6x4 and 6x6 random JSPs. Scatter-plots of the 
predicted versus actual c for these two problem sets are shown in the top portion of Figure 6. 
The value for both of the corresponding Ioqiq-Ioqiq regression models is a remarkable 
0.96. For all but 21 and 11 of the respective 1,000 6 x 4 and 6x6 instances, the actual c is 
within a factor of 2 of the predicted c. For the remaining instances, the actual c deviates 
from the predicted value by a maximum factor of 4.5 and 3.5, respectively. In contrast to 
the diopt-opt and dtabu-opt cost models, there is no evidence of an inverse correlation between 
problem difficulty and model accuracy; if anything, the model is least accurate for the easiest 
problem instances, as shown in the upper left side of Figure 6. A detailed examination of 
the high-residual instances indicates that the source of the prediction error is generally 
the fact that TS^i visits specific subsets of solutions that are close to optimal solutions 
with a disproportionately high frequency, such that the primary assumption underlying 
our Markov model, i.e, lumpability, is grossly violated. As shown below, we have not 
yet observed this behavior in sets of larger random JSPs, raising the possibility that the 
phenomena is isolated. 

Next, we assess the scalability of the dynamic cost model by considering the 42 of our 
10 X 10 random JSPs with < 100,000 optimal solutions. A scatter-plot of the predicted 
versus actual c for these instances is shown in the lower portion of Figure 6: the value 
of the corresponding logi^-logiQ regression model is 0.97. For reference, we include results 
(labeled) for those 10 x 10 random JSPs from the OR Library with < 100,000 optimal 
solutions. The actual c is always within a factor of 2.1 of the predicted c, and there is no 
evidence of any correlation between accuracy and problem difficulty. More importantly, we 
observe no degradation in accuracy relative to the smaller problem sets. 

We have also explored a number of secondary criteria for validation of the dynamic 
cost model. In particular, we observe minimal differences between the predicted and actual 
statistical distributions of both (1) the distances to the nearest optimal solution and (2) the 
trend lengths, i.e., the number of iterations that consistent search gradients are maintained. 
Additionally, we consider differences in the distribution of predicted versus actual search 
costs below in Section 9. Finally, we note that the dynamic cost model is equally accurate 
(r^ > 0.96) in accounting for the cost of locating sub-optimal solutions to arbitrary 6x4 
and 6x6 random JSPs, as well as specially constructed sets of very difficult 6x4 and 6x6 
random JSPs. Both problem types are fully detailed by Watson (2003) . 

7.4 Discussion 

The results presented in this section provide strong, direct evidence for our hypothesis that 
search under TS^i acts as a variant of a straightforward one-dimensional random walk over 
the Siopt+ sub-space. However, the transition probabilities between states of the random 
walk are non-uniform, reflecting the presence of two specific biases in the search dynamics. 
First, search is biased toward solutions that are approximately equi-distant from the nearest 
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Figure 6: Scatter-plots of the predicted versus actual search cost c for 6 x 4 (upper left 
figure), 6x6 (upper right figure), and 10 x 10 (lower figure) random JSPs; the 
least-squares fit lines are super-imposed. 



optimal solution and solutions that are maximally distant from the nearest optimal solution. 
Consequently, in terms of random walk theory, the run-time dynamics can be viewed as a 
diffusion process with a central restoring force toward solutions that are an average distance 
from the nearest optimal solution. Second, TS^i 's short-term memory causes search to 
consistently progress either toward or away from the nearest optimal solution for extended 
time periods; such strong trending behavior has not been observed in random walks or in 
other memoryless local search algorithms for the JSP (Watson, 2003). Despite its central 
role in tabu search, our analysis indicates that, surprisingly, short-term memory is not 
always beneficial. If search is progressing toward an optimal solution, then short-term 
memory will increase the probability that search will proceed even closer. In contrast, 
when search is moving away from an optimal solution, short-term memory inflates the 
probability that search will continue to be led astray. Finally, we note that like diopt-opt and 
dtabu-oph Dmax is a Concrete measure of |5;opt+|'; all three measures directly quantify, with 
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varying degrees of accuracy, the width of the search space explored by TS^i . We further 
discuss the linkage between these measures below in Section 8. 

7.5 Related Research 

Hogs (2002) uses Markov models similar to those presented here to analyze the source of 
specific irregularities observed in the run-length distributions (see Section 9) of some lo- 
cal search algorithms for SAT. Because the particular algorithms investigated by Hoos are 
memoryless, states in the corresponding Markov chain model simply represent the set of 
solutions distance k from the nearest optimal (or more appropriately in the case of SAT, 
satisfying) solution. The transition probabilities for moving either closer to or farther from 
an optimal solution are fixed to the respective constant values and p"*" = indepen- 
dent of A;. By varying the values ofp~, Hoos demonstrates that the resulting Markov chains 
exhibit the same types of run- length distributions as well-known local search algorithms for 
SAT, including GWSAT and WalkSAT. Extensions of this model are additionally used to 
analyze stagnation behavior that is occasionally exhibited by these same algorithms. 

Our research differs from that of Hoos in several respects, the most obvious of which 
is the explicit modeling of TSnj 's short-term memory mechanism. More importantly, we 
derive estimates of both the transition probabilities and the number of states directly from 
instance-specific data. We then test the ability of the resulting model to capture the behav- 
ior of TSni on the specific problem instance. In contrast, Hoos posits a particular structure 
to the transition probabilities a priori. Then, by varying parameter values such as p~ and 
the number of model states, Hoos demonstrates that the resulting models capture the range 
of run-length distributions exhibited by local search algorithms for SAT; accuracy relative 
to individual instances is not assessed. 

Additionally, we found no evidence that the transition probabilities in the JSP are inde- 
pendent of the current distance to the nearest optimal solution. Given that (1) the solution 
representation underlying TSf^i and many other local search algorithms for the JSP is a 
Binary hypercube and (2) neighbors under the Nl operator are by definition Hamming 
distance 1 from the current solution, constant transition probabilities would be entirely 
unexpected from a theoretical standpoint (Watson, 2003). Finally, we have developed anal- 
ogous dynamic cost models for a number of memoryless local search algorithms for the JSP 
based on the Nl move operator, including a pure random walk, iterated local search, and 
Metropolis sampling (Watson, 2003). 

Finally, there are similarities between our notion of effective search space size {\Siopt+\') 
and the concept of a virtual search space size. Hoos (1998) observes that local search algo- 
rithms exhibiting exponentially distributed search costs (which includes TSni , as discussed 
in Section 9) behave in a manner identical to blind guessing in a sub-space of solutions con- 
taining both globally optimal and sub-optimal solutions. Under this interpretation, more 
effective local search algorithms are able to restrict the total number of sub-optimal solu- 
tions under consideration, i.e., they operate in a smaller virtual search space. Our notion of 
effective search space size captures a similar intuition, but is in contrast grounded directly 
in terms of search space analysis; in effect, we provide an answer to a question posed by 
Hoos, who indicates "ideally, we would like to be able to identify structural features of the 
original search space which can be shown to be tightly correlated with virtual search space 
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size" (1998, p. 133). Further, we emphasize the role of the number and distribution of 
globally optimal optimal solutions within the sub-space of sohitions under consideration, 
and directly relate run-time dynamics (as opposed to search cost distributions) to effective 
search space size (i.e., through Dmax)- 

8. The Link Between Search Space Structure and Run-Time Dynamics 

In transitioning from static to dynamic cost models, our focus shifted from algorithm- 
independent features of the fitness landscape to explicit models of algorithm run-time be- 
havior. By leveraging increasingly detailed information, we were able to obtain monotonic 
improvements in cost model accuracy. Discounting the difficulties associated with iden- 
tifying the appropriate types of information, a positive correlation between information 
quantity and cost model accuracy is conceptually unsurprising. Static cost models are 
algorithm-independent - a useful feature in certain contexts - and we anticipate a weak 
upper bound on the absolute accuracy of these models. Our quasi-dynamic dtabu-opi cost 
model is based on the same summary statistic as the static diopt-opt cost model; only the 
sample sets involved in computation of the statistic are different. In either case, the re- 
sulting cost models are surprisingly accurate, especially given the simplicity of the statistic. 
In contrast, a comparatively overwhelming increase in the amount of information appears 
to be required (as embodied in the dynamic cost model) to achieve further increases in 
accuracy. 

Perhaps more interesting than the correlation between information quantity and cost 
model accuracy is the nature of the relationship between the information underlying cost 
models at successive "levels", i.e., between static and quasi-dynamic models, or quasi- 
dynamic and dynamic models. Specifically, we argue that the parameters associated with a 
particular cost model estimate key parameters of the cost model at the subsequent higher 
level, exposing an unexpectedly strong and simple link between fitness landscape structure 
in the .JSP and the run-time dynamics of TSni ■ 

Recall from Section 7.2 that the estimated transition probabilities in the dynamic cost 
model are qualitatively identical across the range of problem instances and that most major 
differences are due to variability in Dmaxi the maximal observed distance to the nearest 
optimal solution. Further, we observe that the "closer" and "farther" transition probabili- 
ties are roughly symmetric around Dmax/'^- Consequently, search under TS^i is necessarily 
biased toward solutions that are approximately distance Dmax/'^ from the nearest optimal 
solution. More precisely, we denote the mean predicted distance to the nearest optimal so- 
lution by d dynamic-opt] simulation confirms that Hdynamic-opt « Dmaxl^, where any deviations 
are due primarily to the asymmetric rise in transition probability as i — >■ for gradients 
equal to closer. But dtabu-opt measures the mean distance between solutions visited dur- 
ing search and the nearest optimal solution, i.e., dtabu-opt ~ d dynamic-opt- Thus, we believe 
the success of the dtabu-opt model is due to the fact that (1) the transition probabilities in 
the dynamic cost model are qualitatively identical across different problem instances and 
(2) dtabu-opt indirectly approximates the key parameter D^ax of the dynamic cost model, 
via the relation dtabu-opt ~ d, dynamic-opt ~ D^axl'^- Discrepancies in the accuracy of the 
dynamic and quasi-dynamic cost models are expected, as no single measure is likely to 
capture the impact of subtle irregularities in the transition probabilities. The power of the 
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diopt-opt model is in turn due to the fact that diopt-opt ~ diabu-opt ^ but only for small problem 
instances. For larger problem instances, diopi.^pt consistently over-estimates dtabu-oph ^iid 
consequently Dmax, by failing to discount those regions of the search space that TSni is 
unlikely to explore. To conclude, the various cost models are related by the expression 
diopt-opt ~ dtabu-opt ~ ddynamic-opt ~ D^axl'^- The linkage between the models is due to the 
fact that these models all attempt, with varying degrees of accuracy, to quantify the effective 
size IS^opt+l' of the sub-space of solutions likely to be visited by TS^i during search. 

9. The Dynamic Cost Model and Run-Length Distributions 

The cost models developed in Sections 5-7 account for variability in central tendency mea- 
sures of problem difficulty, i.e., c and Cq2. In reality, search cost is a random variable C. 
Consequently, a cost model should ideally both qualitatively and quantitatively capture the 
full distribution of C. Because they are based on simple summary statistics, it is difficult 
to imagine how static and quasi-dynamic cost models might be extended to account for C. 
In contrast, a predicted C is easily obtained from the dynamic cost model; as discussed in 
Section 7.3, a predicted C is generated in order to compute c and is subsequently discarded. 
We now analyze the nature of the full C predicted by the dynamic cost model and determine 
whether it accurately represents the actual distribution of search costs under TSni ■ 

We follow Hoos (1998) in referring to the distribution C for a given problem instance 
as the run-length distribution (RLD). In what follows, we consider two specific questions 
relating to the RLDs of random JSPs under TS^i- (1) From what family of distributions 
are the RLDs drawn? and (2) Are the predicted and actual RLDs identically distributed? 
Both questions can be answered using standard statistical goodness-of-fit tests. Although 
the RLDs for TSni are discrete, we approximate the actual distributions using continuous 
random variables; the approximation is tight due to the wide range of search costs observed 
across individual trials, allowing us to avoid issues related to the specification of the bin size 
in the standard goodness-of-fit test for discrete random variables (e.g., such as those per- 
formed by Hoos). Instead, we use the two-sample Kolmogorov-Smirnov (KS) goodness-of-fit 
test, which assumes the existence of samples (in the form of cumulative density functions 
or CDFs) for distinct continuous random variables. The null hypothesis for the KS test 
states that the distributions underlying both samples are identically distributed. The KS 
test statistic quantifies the maximal distance between the two CDFs and the null hypothesis 
is rejected if the "distance" between them is sufficiently large (Scheaffer & McClave, 1990). 

We first consider the family of distributions from which the individual RLDs are drawn. 
Taillard (1994, p. 116) indicates that the number of iterations required to locate optimal 
solutions using his algorithm is approximately exponentially distributed. However, he only 
reported qualitative results for a single 10 x 10 problem instance. Using our set of 10 x 10 
random JSPs, we now perform a more comprehensive analysis. For each instance, we com- 
pute the two-sample KS test statistic for the null hypothesis that the RLD is exponentially 
distributed. The first sample consists of the actual search costs c observed over 1,000 in- 
dependent trials. The second sample consists of 1,000,000 random samples drawn from 
an exponential distribution with a mean c computed from the first sample; direct sam- 
pling from the theoretical CDF is required by the particular statistical software package 
we employed in our analysis. We show the distribution of the p-values associated with the 
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Figure 7: Left Figure: Distribution of the p-values for rejecting the null hypothesis that 
the RLDs of 10 x 10 random JSPs are exponentially distributed under TSni ■ 
Right Figure: The actual and exponential RLDs for the 10 x 10 instance with the 
smallest p-value. 



resulting KS test statistics in the left side of Figure 7. At p < 0.01, we reject the null 
hypothesis for 11 of the 100 instances, i.e., search cost under TS^i is not exponentially 
distributed in roughly 10% of the instances. In the right side of Figure 7, we show CDFs 
of both the actual RLD and the corresponding exponential RLD for the instance with the 
smallest p-value, or, equivalently, the largest deviation between the two CDFs. For this 
instance, and all other instances with p < 0.01, the two distributions differ primarily in 
their left tails. In particular, we observe far fewer low-cost runs than found in a purely 
exponential distribution. 

Our results reinforce Taillard's observation that the RLDs under TSni are approx- 
imately exponentially distributed. Exponential RLDs also arise in the context of local 
search algorithms for other A'^P-hard problems. For example, Hoos (1998, p. 118) reports 
qualitatively similar results for a range of local search algorithms (e.g., Walk-SAT) for MAX- 
SAT. Hoos additionally demonstrated that the deviation from the exponential "ideal" is a 
function of problem difficulty: RLDs of harder (easier) problem instances are more (less) 
accurately modeled by an exponential distribution. A similar relationship holds for the 
RLDs under TS^i ■ In Figure 8, we show a scatter-plot of the mean search cost c versus the 
value of the KS test statistic for 10 x 10 random JSPs. The data indicate that the value of 
the KS test statistic is inversely proportional to instance difficulty. More specifically, the 
RLDs under TS^i are approximately exponential for moderate-to-difficult instances, while 
the exponential approximation degrades for easier instances, e.g., as shown in the right side 
of Figure 7. Given significant differences between MAX-SAT and the JSP, our result raises 
the possibility of a more universal phenomenon. Finally, we note that Hoos also demon- 
strated that the RLDs of easy instances are well-approximated by a WeibuU distribution, 
a generalization of the exponential distribution. Although not reported here, this finding 
also translates to the RLDs of those instances shown in Figure 8 with p < 0.05. 
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Figure 8: Scatter-plot of mean search cost (c) versus the value of the Kolmogorov-Smirnov 
test statistic for comparing the actual search cost distribution with that of the 
corresponding exponential distribution. Large values of the test statistic indicate 
more significant differences. The horizontal lines indicate null hypothesis rejection 
thresholds at p < 0.01 and p < 0.05. 



Next, we analyze whether the RLDs predicted by the dynamic cost model can account 
for the actual RLDs observed for T5jVi . For each of our 10 x 10 random JSPs, we compute 
the two-sample KS test statistic for the null hypothesis that the predicted and actual RLDs 
originate from the same underlying distribution. The first sample consists of the actual c 
observed over 1,000 independent trials of TS^i- The second sample consists of the 10,000 
individual costs c used to estimate the predicted c (see Section 7.3). The discrepancy in 
the two sample sizes stems from the cost associated with obtaining individual samples. We 
only report results for the 42 instances for which estimation of the dynamic cost model 
parameters is computationally tractable. For all but 6 of the 42 instances (< 15%), we 
reject the null hypothesis that the two distributions are identical at p < 0.01; we found no 
evidence of any correlation between p and problem difficulty. In other words, despite the 
success of the dynamic cost model in accounting for c, it generally fails to account for the 
full RLD. Yet, despite statistically significant differences, the predicted and actual RLDs 
are generally qualitatively identical. For example, consider the predicted and actual RLDs 
for the two instances yielding the smallest and largest p-values, as shown in Figure 9. In 
the best case, the two distributions are effectively identical. In the worst case, the two 
distributions appear to be qualitatively identical, such that the actual RLD can be closely 
approximated by shifting the predicted RLD along the x-axis. 

In further support of this observation, we more carefully consider the 36 instances in 
which the differences between the actual and predicted RLDs are statistically significant at 
p < 0.01. For each instance, we compute the KS test statistic for the difference between the 
predicted RLD and an exponential distribution with mean equal to the predicted c. For 
all difficult instances, specifically those with a predicted c > 10, 000, we fail to reject the 
null hypothesis that the underlying distributions are identical at p < 0.01. Consequently, if 
the dynamic cost model were able to accurately predict c, the predicted and actual RLDs 
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Figure 9: CDFs of the predicted and actual RLDs for two 10 x 10 random JSPs. The 
p- values for the KS test statistic are respectively 0.73 and 3.4 x 10~^^. 



would be statistically indistinguishable. For the remaining easy and medium instances, 
we do observe statistically significant differences at p < 0.01 between the predicted and 
corresponding exponential RLDs. However, any differences are isolated to the left tails of 
both distributions, such that the predicted RLDs under-cut the corresponding exponential 
RLDs, e.g., as shown in the right side of Figure 7. In other words, the RLDs predicted 
by the dynamic cost model capture those deviations from the exponential form that are 
observed in the RLDs of TSni on easy and medium difficulty instances. 

Given (1) the well-known difficulty of accurately estimating the mean of an exponential 
or exponential- like distribution and (2) the fact that any "lumped" model of TSni will 
necessarily fail to capture the full detail of the true underlying Markov chain, our results 
provide strong evidence in support of the hypothesis that any observed differences between 
the predicted and actual RLDs are not indicative of any major structural flaw in the dynamic 
cost model. 

10. On the Difficulty of Structured JSPs 

For fixed n and m, the mean difficulty of JSP instances is known to increase as the number of 
workflow partitions wf is varied from 1 (corresponding to random JSPs) to m (corresponding 
to flowshop JSPs), i.e., as more structure is introduced. For example, the mean Cq2 under 
TSfii observed for our 6x6 random, workflow, and flowshop JSPs are 280, 3,137, and 
12,127, respectively. These differences represent an order-of-magnitude increase in average 
difficulty as wf is varied from 1 to m/2 and again from m/2 to m. Similar differences 
are observed for 10 x 10 random, workflow, and flowshop JSPs, where the mean Cq2 are 
respectively 315,413, 4.35 x 10^, and 2.62 x 10^. The often extreme difficulty of structured 
JSPs is further illustrated by the fact that the most difficult 10 x 10 flowshop JSP required 
an average of 900 million iterations of TS^i to locate an optimal solution. 
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Cost Model 


Problem Size 


Structure 


^lopt-opt 


(^tabu-opt 


6x4 


Random 


0.80 


0.84 


Workflow 


0.62 


0.76 


Flowshop 


0.70 


0.72 


6x6 


Random 


0.64 


0.78 


Workflow 


0.30 


0.55 


Flowshop 


0.41 


0.55 



Table 2: The values of the diopt-opt and dtabu-opt cost models of TSni obtained for 6 x 4 
and 6x6 random, workflow, and flowshop JSPs. 



We previously reported that the accuracy of the diopt-opt cost model fails to transfer 
from random .JSPs to workflow JSPs (Watson et al., 2003); additional experiments yield 
similar results for flowshop .JSPs. Table 2 shows the values of the diopt-opi and diabu-opt 
cost models for 6 x 4 and 6x6 random, workflow, and flowshop JSPs. The results in- 
dicate that the diopt-opt model is more accurate on random JSPs than on either workflow 
and flowshop JSPs, although accuracy improves when transitioning from workflow to flow- 
shop JSPs. The dtabu-opt model is more accurate than the diopt_opt model on both types of 
structured JSP. However, despite the absolute improvements relative to the diopt-opt model, 
accuracy of the dtabu-opt model decreases with increases in wf. Overall, the dtabu-opt model 
accounts for slightly over half of the variability in problem difficulty observed in the more 
difficult structured 6x6 JSPs. The significant difference in accuracy {k, 30%) relative to 
random JSPs raises the possibility that the dynamic cost model may be unable to correct 
for the deficiencies of the dtabu-opt model. In particular, factors other than IS/opi+l' may be 
contributing to the difficulty of structured JSPs, or it may not be possible to model the 
behavior of TS^i on structured instances as a simple random walk. 

Consider the transition probabilities under the dynamic cost model for 6 x 4 and 6x6 
workflow and flowshop JSPs, obtained using the sampling methodology described in Sec- 
tion 7.2. Figure 10 shows the probabilities of TS^i continuing to transition closer to the 
nearest optimal solution for two 6x6 flowshop instances. For the instance corresponding 
to the left-hand figure, the transition probabilities are roughly proportional to the current 
distance from the nearest optimal solution, which is consistent with the results observed 
for random JSPs, e.g., as shown earlier in Figure 5. In contrast, for the instance corre- 
sponding to the right-hand figure, the probability of TS^i continuing to move closer to the 
nearest optimal solution is effectively constant at « 0.6. These examples illustrate a key 
point regarding the behavior of TS^i on structured JSPs: the transition probabilities are 
significantly more heterogeneous than those observed for random JSPs, often deviating sig- 
nificantly from the "prototypical" (i.e., symmetric around Dmaxf^) form in both qualitative 
and quantitative aspects. Given such large deviations, it is unsurprising that cost models 
based on simple summary statistics, specifically diabu-opt-, fail to account for a substantial 
proportion of the variability in the difficulty of structured JSPs. 
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Figure 10: The transition probabilities for moving closer to the nearest optimal solution 
under TS^i for distinct 6x6 flowshop JSPs. 



Despite differences in the transition probabilities relative to those observed for random 
JSPs, we observe no impact on the overall accuracy of the dynamic cost model. For 6x4 
and 6x6 workflow JSPs, the values corresponding to a Ioqiq — Ioqiq regression of the 
predicted versus actual c are respectively 0.97 and 0.95. The analogous for both 6x4 
and 6x6 flowshop JSPs is 0.96. For all but a few exceptional instances, the actual c is 
always within a factor of 3 of the predicted c; in no case does the difference exceed a factor 
of 4. Similar results are obtained on a small set of randomly generated 10 x 10 workflow 
and flowshop JSPs for which transition probability estimation is computationally feasible. 

The dynamic cost model accurately captures the run-time dynamics of TS^i on both 
random and structured JSPs, although the transition probabilities are qualitatively different 
in the two types of problem. Heterogeneity in the transition probabilities of structured JSPs 
additionally indicates that variability in the difficulty of these instances is unlikely to be 
captured by simple summary statistics, yielding reductions in the accuracy of our static and 
quasi-dynamic cost models. Finally, we observe that it is still possible that the accuracy of 
the dynamic cost model may degrade significantly under fundamentally different types of 
structure than those considered here, e.g., with strong correlation between subsets of the 
operation durations Tij. 

Mattfeld et al. (1999) indicate that differences in the difficulty of random and workflow 
JSPs are due to differences in the size of the search spaces, as measured by the mean 
distance between random local optima. The observed to 
a nearest optimal solution increases when moving from random JSPs to workflow JSPs, 
and again from workflow JSPs to flowshop JSPs; results for 6 x 4 and 6x6 problem sets 
are shown in Table 3. Consequently, our results serve to clarify Mattfeld et al.'s original 
assertion: given that D^ax is a measure of \Siopt+\' , differences in the difficulty of random 
and structured JSPs are simply due to differences in the effective size of the search space. 
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Structure Type 


Problem Size 


Random 


Workflow 


Flowshop 


6x4 


21.46 


37.01 


44.80 


6x6 


26.67 


47.80 


68.86 



Table 3: The mean maximal distance to the nearest optimal solution {Dmax) observed for 
6x4 and 6x6 random, workflow, and flowshop JSPs. 



11. Moving Toward Cost Models of State-of-the-Art Algorithms 

As discussed in Section 3, TS^i is a relatively straightforward implementation of tabu search 
for the JSP. In particular, it lacks features such as advanced move operators and long-term 
memory mechanisms that have been demonstrated to improve the performance of tabu 
search algorithms for the JSP. Given an accurate cost model of TSni , the next logical step 
is to systematically assess the impact of these algorithmic features on model structure and 
accuracy. Ultimately, the goal is to incrementally move both the target algorithm and the 
associated cost model toward the state-of-the-art, e.g., as currently represented by Nowicki 
and Smutnicki's (2005) i-TSAB algorithm. We now take an initial step toward this goal 
by demonstrating that a key performance-enhancing component - the powerful N5 move 
operator - fails to impact either the structure or accuracy of the dynamic cost model. 

Recall from Section 3 that the neighborhood of a solution s under the Nl move operator 
consists of all solutions obtained by inverting the order of a pair of adjacent operations 
on the same critical block. Let s' G Nl (s) denote the solution obtained by inverting 
the order of two adjacent critical operations Oij and Oj^i in s. If both Oij and Oj^i are 
contained entirely within a critical block, i.e., neither operation appears first or last in 
the block, then Cmax{s') > Cmax{s) (Mattfeld, 1996). In other words, many moves under 
Nl provably cannot yield immediate improvements in the makespan of the current solution 
and therefore should not be considered during search. Building on this observation, Nowicki 
and Smutnicki (1996) introduce a highly restricted variant of the Nl move operator, with 
the goal of accelerating local search by reducing the total cost of neighborhood evaluation. 
This operator, which we denote N5, contributes significantly to the performance of both the 
well-known TSAB algorithm (Nowicki & Smutnicki, 1996) and the current state-of-the-art 
algorithm for the JSP, 2-TSAB (Nowicki &; Smutnicki, 2005). However, the power of the 
N5 operator comes with a price: the induced search space is disconnected, such that it 
is not always possible to move from an arbitrary solution to a globally optimal solution. 
Consequently, no local search algorithm based strictly on N5 can be PAC in the theoretical 
sense. Further, as discussed below, empirical PAC behavior for basic tabu search algorithms 
based on the N5 operator is not possible for random JSPs. 

The lack of the PAC property significantly complicates the development of cost models, 
as it is unclear how to quantify search cost or, equivalently, problem difficulty. This fact, 
in large part, drove our decision to base our research on the TS^i algorithm. Recently, 
we demonstrated that for random JSPs the N5 operator can induce small "traps" or iso- 
lated sub-components of the search space from which escape is impossible (Watson, 2003). 
Fortunately, these traps are easily detected and can be escaped via a short random walk 
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Figure 11: Left Figure: Scatter-plot of the actual search cost c under the TS^s versus TSni 
algorithms; the line y = a; is super-imposed. Right Figure: Scatter-plot of the 
predicted versus actual search cost c for 10 x 10 random JSPs using TSns; the 
least-squares fit line is super-imposed. 



under the more general (i.e., connected) Nl move operator. Given these observations, we 
introduced a novel AT^-based tabu search algorithm for the JSP that is empirically PAC 
on all of our 6 x 4, 6 x 6, and 10 x 10 sets of random JSPs. The algorithm, which we 
denote TSn5, is detailed by Watson (2003). However, with the exception of both the move 
operator and the trap detection/escape mechanisms, TSni and TSns are identical. We 
further observe that traps are infrequently encountered (typically at most once every 5K or 
more iterations), and we use identical settings for all parameters found in both algorithms, 
i.e., Ljnin and Lmax- Consequently, we are now able to characterize the impact of the N5 
move operator on the effectiveness of tabu search for the JSP under controlled experimental 
conditions. 

Consider the mean search cost c under both TSni and TSns on our 10 x 10 random 
JSPs; the corresponding scatter-plot is shown in the left side of Figure 11. We observe un- 
expectedly low correlation between problem difficulty under the two algorithms; differences 
of a factor of 10 are common and reach nearly a factor of 100 in the worst case. Due to 
the low frequency of occurrence, a minimal proportion of the observed differences can be 
attributed to the trap detection and escape mechanisms. The implication is that the move 
operator can dramatically alter the cost required by tabu search to locate optimal solutions 
to random JSPs. In many cases, the effect is actually detrimental in that the mean number 
of iterations required under TSns can be significantly larger than that required under TSni ■ 
However, the number of neighbors under the Nl operator commonly exceeds that under 
the N5 operator by a factor of 10 or more, especially on larger problem instances, masking 
any detrimental effects in the vast majority of cases. As a result, TSns consistently locates 
optimal solutions in lower overall run-times on average than TSni ■ 

Large differences in the observed c under TSni and TSns are necessarily indicative of 
differences in the underlying run-time dynamics. We now consider whether the differences 
are truly qualitative or merely quantitative. In other words, is the random walk model 
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proposed in Section 7 no longer applicable, or can the differences be explained in terms of 
changes in model parameters such as D^ax and the transition probabilities P{Sj_x'\Si^x)'^ 
To answer this question, we first compute estimates of the dynamic cost model parameters 
using the sampling methodology described in Section 7.2, with one exception: due to their 
relative rarity, we do not attempt to capture the random walk events associated with trap 
escape. To ensure tractability, we consider only the 42 of our 10 x 10 random JSPs with < 
100,000 optimal solutions. 

The resulting transition probabilities are more irregular than those observed under TSni 
on the same problem instances, mirroring the results obtained for structured JSPs described 
in Section 10. Additionally, we frequently observe large discrepancies in D^ax under TSni 
and TSn5, which, in part, accounts for the observed discrepancies in c. Using the resulting 
parameter estimates, we compute the predicted c and compare the results with the actual c 
observed using TSns', the corresponding scatter-plot is shown in the right side of Figure 11. 
The value of the corresponding Ioqiq-Ioqiq regression model is 0.96, and in all cases the 
actual c deviates from the predicted c by no more than a factor of 5. Overall, these results 
demonstrate that the N5 move operator has negligible effect on the absolute accuracy of 
the dynamic cost model; as in TSni, search in TSns simply acts as a biased random walk 
over the Siopt+ sub-space. Consequently, the dynamic cost model is an appropriate basis 
for a detailed analysis of 'i-TSAB or related algorithms, which differ from TSns primarily 
in the use of long-term memory mechanisms such as reintensification and path relinking 
(Nowicki & Smutnicki, 2005). 

12. Exploring the Predictive Capability of the Dynamic Cost Model 

Thus far, our primary goal has been to explain the source of the variability in the cost of 
locating optimal solutions to random JSPs using TSni ; the dynamic cost model introduced 
in Section 7 largely achieves this objective. Despite this success, however, we have only 
illustrated the explanatory power of the model. Ideally, scientific models are predictive, 
in that they lead to new conjectures concerning subject behavior and are consequently 
falsifiable. We next use the dynamic cost model to propose and empirically confirm two novel 
conjectures regarding the behavior of TSni on random JSPs. Our analysis demonstrates 
that the utility of cost models can extend beyond after-the-fact explanations of algorithm 
behavior. 

12.1 The Variable Benefit of Alternative Initialization Methods 

Empirical evidence suggests that high-quality initial solutions can improve the performance 
of tabu search algorithms for the JSP (Jain, Rangaswamy, &; Meeran, 2000). Yet, both 
the exact conditions under which improvements can be achieved and the expected degree 
of improvement are poorly understood. We now explore a particular aspect of this broader 
issue by considering the question: What impact do different initialization methods have 
on the cost required by TSni to locate optimal solutions to random JSPs? The preceding 
analyses of TSni are based on the assumption that search is initiated from a random local 
optimum. Here, we instead consider the behavior predicted by the dynamic cost model 
when search is initiated from solutions other than random local optima. 
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Let Vi denote the predicted mean search cost required to locate an optimal solution 
imder the dynamic cost model when search is initiated from solutions that are distance i 
from the nearest optimal solution. As in Section 7, we assume the initial gradient x equals 
closer or farther with equal probability. Ignoring potential asymmetries in the distribution of 
dopt{s) for random local optima s, observe that the predicted c is approximately equal to 
where S = rint{diopt-opt) ^ i.e., TS^i is initiated from a local optimum that is an average 
distance diopt-opt from the nearest optimal solution. We address the issue of the impact of 
alternative initial solutions on the performance of TSni by analyzing the nature of Vi for 
i ^ 6. In Figure 12, we show plots of the predicted costs Vi over a wide range of « for specific 
6x6 (left figure) and 10 x 10 (right figure) random JSPs; results for i < 2, where Vi <C 100, 
are omitted for purposes of visualization. For the 6x6 instance, search cost rises rapidly 
between i = 3 and i = 10 and continues to gradually increase as i ^ Dmax- In contrast, 
search cost for the 10 x 10 instance rises rapidly between i = 2 and i 15 but is roughly 
constant, modulo the sampling noise, once i > 15. Even when « = 3, the dynamic model 
predicts that search cost is still significant: if the initial search gradient is not closer, search 
is rapidly driven toward solutions that are distant from the nearest optimal solution and any 
benefit of the favorable initial position is lost. We observe qualitatively identical behavior in 
a large sample of our random JSPs, arriving at the following general observation: for easy 
(hard) instances, the approach toward an asymptotic value of Vi as i ^ Dmax is gradual 
(rapid). Consequently, we hypothesize that a particular initialization method will at best 
have a minimal impact on the performance of TS^i unless the resulting solutions are very 
close to the nearest optimal solution. Additionally, we observe that the dynamic cost model 
predicts that the distance to the nearest optimal solution, and not solution fitness, dictates 
the benefit of a given initialization method. The distinction is especially key in the JSP, 
where fitness-distance correlation is known to be comparatively weak, e.g., in contrast to 
the Traveling Salesman Problem (Mattfeld, 1996). 

To test this hypothesis, we analyze the performance of TSni using a variety of heuristic 
and random methods to generate initial solutions. Following Jain et al. (2000), we consider 
the following set of high-quality priority dispatch rules (PDRs) used in conjunction with 
Giffler and Thompson's (1960) procedure for generating active solutions^: 

• FCFS (First-Come, First-Serve), 

• LRM (Longest ReMaining work) , 

• MWKR (Most WorK Remaining), and 

• SPT (Shortest Processing Time). 

Additionally, we consider both active and non-delay solutions (Giffler Sz Thompson, 1960) 
generated using random PDRs, respectively denoted RNDactv and RNDndiy Finally, we 
include Taillard's (1994) lexicographic construction method, denoted LEX, and the insertion 
procedure introduced by Werner and Winkler (1995), which we denote WW; the latter is 
one of the best constructive heuristics available for the random JSP (Jain et al., 2000). 
Random semi-active solutions serve as a baseline and are denoted RNDgemi- The solutions 

8. Technically, there is a formal difference between a solution and a schedule in the JSP. However, because 
we are assuming earliest start-time scheduling of all operations, we use the two terms interchangeably. 
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Figure 12: Plots of the distance i between an initial solution and the nearest optimal solu- 
tion and the predicted search cost w j for a 6 x 6 (left figure) and a 10 x 10 (right 
figure) random JSP. The 6 = rint{diopt-opt) for these two instances are 29 and 
65, respectively. 



resulting from all methods are transformed into local optima by applying steepest-descent 
under the Nl operator. 

We again consider only the 42 10 x 10 random JSPs with < 100,000 optimal solutions; 
we selected the larger - as opposed to 6x4 or 6x6 - problem set due to the higher 
degree of expected difficulty. For each initialization method, we compute diopt-opt for each 
problem instance using (with the exception of LEX, which is deterministic) 5,000 local 
optima generated by applying steepest-descent search to the solutions resulting from the 
given method. For RNDgemi, "we obtain a mean diopt.gpt, averaged over all 42 instances, of 
approximately 70.92. We show the mean diopt-opt for each remaining initialization method in 
Table 4; p- values for the statistical significance of the mean difference in diopt-opt between the 
various methods and RNDgemh computed using a Wilcoxon non-parametric, paired-sample 
signed rank test, are also provided. With the exception of SPT, we observe significant 
differences in diopt-opt between the baseline RNDgemi solutions and those resulting from other 
initialization methods. Initially, these data suggest that it may be possible to improve 
the performance of TS^i using initialization methods with low diopt-opt- However, the 
lowest mean values of diopt-opt, obtained using the LEX and WW methods, are still large 
in absolute terms. Consequently, given a combination of our working hypothesis and the 
average difficulty of 10 x 10 instances (e.g., see the right side of Figure 12), it seems likely 
that even these solutions are still too far from the nearest optimal solution to impact overall 
search cost. 

For each problem instance, we next compute the Cq2 under each initialization method; 
statistics are taken over 1,000 independent trials of TS^i- The percent differences in Cq2 
for each initialization method relative to that obtained under the RNDgemi baseline are 
reported in Table 4. The worst-case deviation is less than 3% and the best improvement, 
obtained under WW, is only 2.79%. Further, all observed discrepancies can be attributed 
to sampling error in computation of Cqj and no differences were statistically significant even 
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Initialization Method 


FCFS 


LRM 


MWKR 


SPT 


LEX 


RNDactv 


RNDndly 


WW 


diopt-opt 


58.49 


97.41 


97.94 


64.97 


49.25 


64.68 


58.55 


53.10 


p for mean 
difference in 

d lopt-opt 

relative to 

RND,e,n, 


0.001 


0.001 


0.001 


0.126 


0.001 


0.001 


0.001 


0.001 


% mean 
difference in 

CQ2 relative 

to RNDsem, 


1.76 


2.32 


2.94 


1.55 


1.44 


1.07 


0.06 


-2.79 


p of mean 

difference in 

relative to 

Dsemi 


0.059 


0.084 


0.073 


0.077 


0.513 


0.573 


0.556 


0.309 



Table 4: The differences in both the mean distance to the nearest optimal solution {diopi-opt) 
and search cost (cqj) for various initialization methods on 10 x 10 random JSPs; 
differences are measured relative to random semi-active solutions {RNDsemi)- 



at p < 0.05. The data support the hypothesis predicted by the dynamic cost model: for 
difficult problems, available initialization methods for the JSP have no significant impact 
on the performance of TS^i . We conclude by observing that our results say nothing about 
the cost required for TS^i to locate optimal solutions to easy-to- moderate instances or 
sub-optimal solutions on a range of problem instances. In particular, we observe that 
alternative initialization methods may improve performance in these situations, due to the 
gradual increase of Vi associated with less difficult problem instances. Similarly, alternative 
initialization methods may benefit tabu search algorithms that employ re-intensification, 
such as those developed by Nowicki and Smutnicki. We have not investigated whether 
similar results hold on structured JSPs, principally because of the increased difficulty in 
computing both Cq2 and diopt-opt for these instances. 

12.2 The Specification of Tabu Tenure 

Empirically, the performance of tabu search depends heavily upon the choice of tabu tenure. 
Although "no single rule has been designed to yield an effective tenure for all classes of prob- 
lem" (Glover &; Laguna, 1997, p. 47), it is generally recognized that small tabu tenures lead 
to search stagnation, i.e., the inability to escape local optima, while large tabu tenures can 
yield significant deterioration in search effectiveness. Beyond these loose observations, prac- 
titioners have little guidance in selecting tabu tenures, and there is no theoretical justifica- 
tion for preferring any particular values within a range of apparently reasonable possibilities. 
In TS^i , a side-effect of short-term memory is to consistently bias search either toward or 
away from the nearest optimal solution. Intuitively, we would expect the magnitude of this 
bias to be proportional to the tabu tenure L; longer tenures should force search to make 
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more rapid progress away from previously visited regions of the search space. Consider the 
scenario in which TSni is steadily progressing away from the nearest optimal solution, such 
that the current distance to the nearest optimal solution is given by X dtabu-opt- For 
any fixed L, TSni will eventually invert the gradient and move search toward the nearest 
optimal solution. However, the larger the value of L, the more distant TSni is likely to 
move from the nearest optimal solution before inversion occurs. In terms of our dynamic 
model of TSni , this suggests that the maximal likely distance D^ax to the nearest optimal 
solution achieved by TSni is proportional to L. We have shown that problem difficulty in 
the JSP is largely a function of the effective search space size l-S^opt+l', of which Dmax is one 
measure. Consequently, we hypothesize that any increase in the tabu tenure translates into 
growth in \Siopi+\' and by inference problem difficulty. 

To test this hypothesis, we first examine the D^ax obtained using our sampling method- 
ology over a range of tabu tenures. We consider 10 x 10 random JSPs, specifically the 42 
instances with < 100, 000 optimal solutions. In TSni , the mean tabu tenure L is dictated 
by the interval [Lmin, Lmax]- Previously, we let [Lmim Lmax] = [8,14]; this particular value 
was empirically determined by Taillard to yield good performance on the ft 10 10 x 10 
benchmark instance. We now test the impact of both smaller and larger tabu tenure in- 
tervals on the performance of TSni ■ Based on extensive experimentation, we observe that 
[5, 10] approximates the smallest tenure interval for which TSni is empirically PAC, i.e., 
avoids becoming trapped in either local optima or restricted regions of the search space. 
On average, the D^ax obtained under the [8, 14] interval are roughly 6% greater than those 
obtained under the [5, 10] interval, while the Dmax obtained under a larger (arbitrarily cho- 
sen) [10, 18] interval are in turn roughly 5% greater than those observed under the [8, 14] 
interval; we attribute the non-uniform growth rate to differences in the mean tabu tenures 
under the [5, 10] and [8, 14] intervals versus the [8, 14] and [10, 16] intervals. Overall, these 
results confirm the intuition that larger tabu tenures lead to increased \Siopt-\-\', as measured 
by Dmax'-: we observe qualitatively identical changes in d 

tabu-opt- 

To confirm that changes in Dmax yield corresponding changes in problem difficulty, we 
compute the observed c under TSni for each problem instance over the three tabu tenure 
intervals. Here, we consider all 100 instances in our 10 x 10 problem set; the implicit 
assumption is that similar changes in Dmax hold for instances with > 100,000 optimal 
solutions. Scatter-plots of the resulting c for [5, 10] versus [8, 14] and [8, 14] versus [10, 18] 
tenure intervals are respectively shown in the left and right sides of Figure 13. The c under 
the medium interval [8, 14] are roughly 95% larger than those observed under the smaller 
[5, 10] interval, while the c obtained under the [10, 16] interval are roughly 60% greater than 
those obtained under the [8, 14] interval; again, we attribute the non-uniform growth rate to 
discrepancies in the difference in mean tabu tenure. We observe similar monotonic growth 
in problem difficulty for a limited sample of even larger tenure intervals. Overall, the results 
support our hypothesis that larger tabu tenures increase problem difficulty, specifically by 
inflating \Siopt+\'- Although not reported here, we additionally observe similar results on a 
small sample of 6 x 6 workflow and flowshop JSPs. 

Our experiments indicate that the tabu tenure L for TSni should be chosen as small as 
possible while simultaneously avoiding search stagnation. In addition to providing the first 
theoretically justified guideline for selecting a tabu tenure, this observation emphasizes the 
potentially detrimental nature of short-term memory. In particular, the results presented 
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Figure 13: Scatter-plots of the relative cost c required to locate an optimal solution under 
TSni for small versus moderate tabu tenures (left figure) and moderate versus 
large tabu tenures (right figure), for 10 x 10 random JSPs. 



above suggest that any amount of short-term memory in excess of that which is required 
to escape the attractor basins of local optima is lilcely to degrade the performance of TS^i ■ 



13. Implications and Conclusions 

Our results provide a significant first step toward developing an understanding of the dy- 
namics underlying tabu search. We have introduced a random walk model of Taillard's 
tabu search algorithm for the JSP that, despite its simplicity, accounts for nearly all of the 
variability in the cost required to locate optimal solutions to random JSPs. Additionally, 
the model accounts for similarly high proportions of the variability in the cost required to 
locate both sub-optimal solutions to random JSPs and optimal solutions to more structured 
JSPs. Our results indicate that search in Taillard's algorithm can be viewed as a variant 
of a straightforward one-dimensional random walk that exhibits two key types of bias: (1) 
a bias toward solutions that are roughly equi-distant from the nearest optimal solution 
and solutions that are maximally distant from the nearest optimal solution and (2) a bias 
that tends to maintain consistent progress either toward or away from the nearest opti- 
mal solution. In contrast to cost models of problem difficulty based on static search space 
features, the random walk model is scalable and provides direct insight into the dynam- 
ics of the search process. Additionally, we identified an unexpectedly strong link between 
the run-time dynamics of tabu search and simple features of the underlying search space, 
which provides an explanation for the initial success and ultimate failure of our earlier 
diopt-opt model of problem difficulty. Although we have not fully explored the predictive 
capabilities of the random walk model, two novel behaviors predicted by the model have 
been confirmed through experimentation on random JSPs: the failure of initial solutions 
to significantly impact algorithm performance and the potentially detrimental nature of 
short-term memory. 
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Despite the success of the random walk model, several issues remain unresolved. For 
example, it is unclear why TSj^i is imlikely to explore potentially large regions of Siopt+, 
i.e., why random local optima are not necessarily representative of solutions visited by TSni 
during search. Similarly, a causal explanation for the bias toward solutions that are roughly 
equi-distant from optimal solutions and solutions maximally distant from optimal solutions 
is lacking, although preliminary evidence indicates the bias is simply due to the choice of 
representation, i.e., the binary hypercube. 

Perhaps the most important contribution of the random walk model is the foundation 
it provides for future research. State-of-the-art tabu search algorithms for the JSP make 
extensive use of long-term memory (Nowicki & Smutnicki, 2005), and it is unclear how 
such memory will impact the structure of the random walk model. Moving beyond the JSP, 
there is the question of generalization: Do similar results hold when considering tabu search 
algorithms for other NP-haid problems, e.g., the quadratic assignment and permutation 
flow-shop scheduling problems? Although the model and associated methodology can be 
straightforwardly applied to other problems, representations, and even local search algo- 
rithms, it is unclear a priori whether we can expect sufficient regularities in the resulting 
transition probabilities to yield accurate predictions. This is especially true for problems in 
which highly structured benchmarks are more prevalent, e.g., in SAT. Finally, the random 
walk model is. at least currently, largely of only a posteriori use. It is unclear how such a 
model might be leveraged in order to develop improved tabu search algorithms. For exam- 
ple, although it is clear that the bias toward solutions that are distant from optimal solutions 
should be minimized, it is far from obvious how this can be achieved. Similarly, another 
potential application of the random walk model involves predicting problem difficulty; now 
that the dominant factors influencing problem difficulty in the JSP are becoming better 
understood, an obvious next step is to analyze whether it is possible to achieve accurate 
estimates of these quantities with minimal or moderate computational effort. 

The objective of our research was to "demystify" the behavior of tabu search algorithms, 
using the JSP as a testbed. In this goal, we have succeeded. Our random walk model 
captures the run-time dynamics of tabu search for the JSP accounts for the primary behavior 
of interest: the cost required to locate optimal solutions to problem instances. The power of 
the model is further illustrated by its ability to account for additional behavioral phenomena 
and correctly predict novel behaviors. Through careful modeling and analysis we have 
demonstrated that despite their effectiveness, tabu search algorithms for the JSP are in fact 
quite simple in their operation. The random walk model should serve as a useful basis for 
exploring similar issues in the context of both more advanced tabu search algorithms for the 
JSP and tabu search algorithms for other iVP-hard combinatorial optimization problems. 
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